From Task-Specific AI Models to General-Purpose Large Language Models

The Paradigm Shift in Artificial Intelligence

1. From Specific to General

The field of artificial intelligence has undergone a radical shift in how models are trained and deployed.

Old paradigm (task-specific training): Models such as early CNNs or fine-tuned BERT classifiers were each trained toward one specific goal (for example, sentiment analysis only). You needed a separate model for translation, another for summarization, and so on.
New paradigm (centralized pre-training + prompting): A single large model (an LLM) learns general world knowledge from an internet-scale dataset and can then be steered to perform almost any language task simply by changing the input prompt.
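
As a rough illustration of "one model, many tasks", the sketch below builds three different prompts for the same input text. The template wording and the names PROMPT_TEMPLATES and build_prompt are hypothetical and chosen only for this example; no real model or API is called.

```python
# Hypothetical prompt templates: the task names and wording are illustrative only.
PROMPT_TEMPLATES = {
    "sentiment": (
        "Classify the sentiment of the following text as positive or negative.\n\n"
        "Text: {text}\nSentiment:"
    ),
    "translate": (
        "Translate the following text into French.\n\n"
        "Text: {text}\nTranslation:"
    ),
    "summarize": (
        "Summarize the following text in one sentence.\n\n"
        "Text: {text}\nSummary:"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Return the prompt that would steer one general model toward `task`."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text)


if __name__ == "__main__":
    review = "The battery lasts all day and the screen is gorgeous."
    # Same model, same input text: only the instruction changes.
    for task in PROMPT_TEMPLATES:
        print(build_prompt(task, review))
        print("---")
```

Under the old paradigm, each of these three tasks would have required its own trained model; here the only thing that varies is the string sent to the model.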

2. Architectural Evolution

Encoder-only (the BERT era): Focused on understanding and classification. These models read text bidirectionally to capture context in depth, but they are not designed to generate new text.
Decoder-only (the GPT/Llama era): The modern standard for generative AI. These models use auto-regressive modeling to predict the next token, which makes them well suited to open-ended generation and conversation.
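
The difference between the two families can be sketched in terms of which positions each token is allowed to look at, plus the generation loop that a decoder-only model runs. This is a minimal sketch in plain Python, not a real transformer: TOY_NEXT_TOKEN is a hypothetical lookup table standing in for a trained model, and the function names are assumptions made for this example.

```python
def bidirectional_mask(n: int) -> list[list[bool]]:
    """Encoder-style: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[bool]]:
    """Decoder-style: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Hypothetical toy "model": maps the last token to its most likely successor.
TOY_NEXT_TOKEN = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    """Greedy auto-regressive loop: predict one token, append it, repeat."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = TOY_NEXT_TOKEN.get(tokens[-1], "<eos>")
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    print(generate(["the"]))
```

The causal mask is what makes generation possible: because each position only ever sees earlier tokens, the model can be asked for the next token, have it appended, and be asked again.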

3. Key Drivers of Change

Self-supervised learning: Training on massive amounts of unlabeled internet data removes the bottleneck of human data labeling.
Scaling laws: Empirical observations showing that model performance improves predictably as model size (parameters), data volume (tokens), and compute increase.
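
Both drivers can be sketched in a few lines. The first function shows how raw text supplies its own labels for next-token prediction; the second uses a power-law form that is often used to describe scaling behavior, L = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The constants below are placeholder values chosen for illustration, not fitted results, and the 6 * N * D compute estimate is a common rule of thumb rather than an exact figure.

```python
def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Self-supervision: each prefix is an input, the token after it is the label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def power_law_loss(
    n_params: float,
    n_tokens: float,
    e: float = 1.7,      # illustrative irreducible loss (assumption)
    a: float = 400.0,    # illustrative coefficient for model size (assumption)
    b: float = 400.0,    # illustrative coefficient for data size (assumption)
    alpha: float = 0.3,  # illustrative exponent (assumption)
    beta: float = 0.3,   # illustrative exponent (assumption)
) -> float:
    """Loss = E + A / N^alpha + B / D^beta with placeholder constants."""
    return e + a / n_params ** alpha + b / n_tokens ** beta


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: roughly 6 * N * D floating-point operations."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # No human labels needed: the text itself provides every target.
    for prefix, label in next_token_pairs(["the", "cat", "sat"]):
        print(prefix, "->", label)

    # Loss falls smoothly as the model grows at a fixed data budget.
    for n in (1e8, 1e9, 1e10):
        print(f"N={n:.0e}  loss={power_law_loss(n, 1e11):.3f}")
```

The point of the sketch is the shape, not the numbers: under this form, each tenfold increase in parameters or tokens reduces the loss by a predictable amount, which is what lets training runs be planned in advance.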

Key Takeaway

AI has shifted from "task-specific tools" to "general-purpose agents" that exhibit emergent abilities such as reasoning and in-context learning.

Question 1

What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?

Moving from cloud computing to local processing.

Moving from task-specific training to centralized pre-training with prompting.

Moving from Python to C++ for model development.

Moving from Decoder-only to Encoder-only architectures.

Question 2

According to scaling laws, which three factors are fundamentally linked to model performance?

Internet speed, RAM size, and CPU cores.

Human annotators, code efficiency, and server location.

Model size (parameters), data volume (tokens), and total computation.

Prompt length, temperature setting, and top-k value.

Challenge: Evaluating Architectural Fitness

Apply your knowledge of model architectures to real-world scenarios.

You are an AI architect tasked with selecting the right foundational approach for two different projects. You must choose between an Encoder-only architecture (like BERT) and a Decoder-only architecture (like GPT).

Task 1

You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?

Solution: Encoder-only (e.g., BERT)

Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.

Task 2

You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?

Solution: Decoder-only (e.g., GPT/Llama)

This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.